Evaluating Speech Separation Systems
Abstract
Common evaluation standards are critical to making progress in any field, but they can also distort research by shifting all the attention to a limited subset of the problem. Here, we consider the problem of evaluating algorithms for speech separation and acoustic scene analysis, noting some weaknesses of existing measures and making some suggestions for future evaluations. We take the position that the most relevant ‘ground truth’ for sound mixture organization is the set of sources perceived by human listeners, and that the best evaluation standards would measure the machine’s match to this perception at a level abstracted away from the low-level signal features most often considered in signal processing.
Similar resources
Joint optimization of recurrent networks exploiting source auto-regression for source separation
Source separation is very difficult when the interference is music. In this paper, we propose a novel recurrent network exploiting the auto-regressions of speech and music interference for source separation. An auto-regression can capture the short-term temporal dependencies in data to help the source separation. For the separation, we independently separate the magnitude spectra of speech an...
First-order Differential Beamforming and Joint-process Estimation for Spatial Source Separation
Speech enhancement is a technique required to ensure the success of speech recognition systems working under strongly noisy conditions, and to ensure intelligibility in speech transmission and coding. Array beamforming has traditionally been used to improve the signal-to-noise ratio. Two-sensor systems based on First-Order Differential Beamformers (FODB) have been proposed as a pro...
Single channel speech separation in modulation frequency domain based on a novel pitch range estimation method
Computational Auditory Scene Analysis (CASA) has been a focus of recent literature on speech separation from monaural mixtures. The performance of current CASA systems on voiced speech separation strictly depends on the robustness of the algorithm used for pitch frequency estimation. We propose a new system that estimates the pitch (frequency) range of a target utterance and separates voiced por...
Application of Over-complete Blind so Automatic Speech Re
Spoken-dialogue-based information retrieval systems used in mobile environments are becoming popular. However, the mobile environment changes dynamically and contains many interfering signals. These two effects degrade automatic speech recognition (ASR) accuracy and hence the performance of spoken-dialogue-based information retrieval systems. One way to...
TasNet: time-domain audio separation network for real-time, single-channel speech separation
Robust speech processing in multi-talker environments requires effective speech separation. Recent deep learning systems have made significant progress toward solving this problem, yet it remains challenging, particularly in real-time, short-latency applications. Most methods attempt to construct a mask for each source in a time-frequency representation of the mixture signal, which is not necessari...